Collaborators: Tomo Tanaka, John Eykelenboom.

Cartoon project explanation

shiny app

1 Data

In our data we distinguish four states, marked by colour: blue, brown, pink, red. For the purpose of the model we merge blue and brown together into one state. Here is the result of the experiment from 55 cells.

The next figure shows the proportions of each colour as a function of time. The curves are smoothed with a running mean over a window of 5 time points.
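A minimal sketch of the smoothing step. It assumes a centred window that is truncated at both ends of the series; the code behind the figure may handle the edges differently, for example by leaving them empty. The function name `running_mean` is hypothetical.

```python
def running_mean(values, window=5):
    """Centred running mean; near both ends the window is truncated."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Example: a single spike is spread over the neighbouring time points.
print(running_mean([0, 0, 5, 0, 0]))
```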

2 Model

The model consists of three states and a set of rules.

2.1 States

  • blue/brown (B)
  • pink (P)
  • red (R)

2.2 Rules

  • Simulation is carried out at a discrete time step of 1 min
  • There is a fixed time for nuclear envelope breakdown, \(t_0 = 0\); \(t_0\) can be changed if necessary
  • Time \(t_1\) is a random variable with exponential distribution with time scale \(\tau_1\)
  • Time \(\Delta t_2\) is a random variable with exponential distribution with time scale \(\tau_2\)
  • Time \(\Delta t_3\) is a random variable with exponential distribution with time scale \(\tau_3\)
  • The cell is in state B before time \(t_0-t_1\).
  • B\(\rightarrow\)P occurs after \(t_0-t_1\) with rate \(k_1\)
  • P\(\rightarrow\)R occurs after \(t_0-t_1 + \Delta t_2\) with rate \(k_2\)*
  • P\(\rightarrow\)B occurs after \(t_0-t_1 + \Delta t_3\) with rate \(k_3\)

* A switch, \(t_{2, ref}\), was introduced to the model to select the reference point for the P\(\rightarrow\)R activation time. In the default position (\(t_{2, ref} = 1\)) the P\(\rightarrow\)R activation time is \(t_0-t_1 + \Delta t_2\). When the switch is set to \(t_{2, ref} = 0\), the activation time is \(t_0 + \Delta t_2\); that is, the transition can only occur after nuclear envelope breakdown. The idea of the switch was to separate the pink and red curves, as observed in some data sets.

I use a Markov chain approach. The next state is generated from the current state based on the rules outlined above. The rates, \(k\), are converted into probabilities over a given time step \(\Delta t\) as \(Pr = 1 - e^{-k\Delta t}\). The transition time, \(t_1\), is generated before the simulation starts for a given cell.
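The conversion follows from the constant-rate assumption: the waiting time for a transition with rate \(k\) is exponential, so the probability that nothing happens within one step is \(e^{-k\Delta t}\). As a worked example, take the median untreated \(k_1\) from the 4-parameter bootstrap table:

\[
Pr = 1 - e^{-k\Delta t} \approx k\Delta t \quad (k\Delta t \ll 1), \qquad
k_1 = 0.0654\ {\rm min}^{-1},\ \Delta t = 1\ {\rm min} \;\Rightarrow\; Pr = 1 - e^{-0.0654} \approx 0.0633 .
\]

Over a 1-min step the transition probability is therefore close to, but slightly below, the rate itself.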

The cell timeline is repeated \(n_{\rm cell}\) times, and the colour proportions are then found at each time point.
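The rules and the Markov-chain update can be sketched as follows. This is a hypothetical illustration, not the code behind the figures. Function and parameter names are invented. R is assumed to be absorbing, since no rule leads out of it, and each \(\tau\) is taken as the mean of its exponential distribution. When both P\(\rightarrow\)R and P\(\rightarrow\)B are active in the same step, the sketch tests P\(\rightarrow\)R first, which is an arbitrary choice. `t2_ref` mimics the switch \(t_{2, ref}\).

```python
import math
import random


def rate_to_prob(k, dt=1.0):
    """Probability that a transition with constant rate k fires within a step dt."""
    return 1.0 - math.exp(-k * dt)


def simulate_cell(times, tau1, tau2, tau3, k1, k2, k3, t0=0.0, t2_ref=1, rng=random):
    """States ('B', 'P' or 'R') of one cell at each time point (evenly spaced, in min).

    Hypothetical sketch of the rules in Section 2.2; R is assumed to be absorbing.
    """
    t1 = rng.expovariate(1.0 / tau1)   # exponential with mean tau1
    dt2 = rng.expovariate(1.0 / tau2)
    dt3 = rng.expovariate(1.0 / tau3)
    t_bp = t0 - t1                     # B->P allowed after this time
    t_pr = (t_bp if t2_ref == 1 else t0) + dt2   # P->R activation (switch t_{2,ref})
    t_pb = t_bp + dt3                  # P->B activation
    p1, p2, p3 = (rate_to_prob(k, times[1] - times[0]) for k in (k1, k2, k3))
    state, states = "B", []
    for t in times:
        if state == "B":
            if t > t_bp and rng.random() < p1:
                state = "P"
        elif state == "P":
            # If both P transitions are active, P->R is tested first (arbitrary choice).
            if t > t_pr and rng.random() < p2:
                state = "R"
            elif t > t_pb and rng.random() < p3:
                state = "B"
        states.append(state)
    return states


def colour_proportions(n_cell, times, **params):
    """Simulate n_cell independent cells; return the fraction in B, P and R per time point."""
    counts = [{"B": 0, "P": 0, "R": 0} for _ in times]
    for _ in range(n_cell):
        for tally, state in zip(counts, simulate_cell(times, **params)):
            tally[state] += 1
    return [{s: n / n_cell for s, n in tally.items()} for tally in counts]
```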

2.3 Example

Here is an example of the model.

And here is an example of the model where the P\(\rightarrow\)R transition can occur only after \(t_0 = 0\).

2.4 Parameter tuner

There is a shiny app that allows tuning the parameters in search of the best solution.

3 Fitting model to the data

I attempted to fit the model to our data. This is not an easy task, as the model is stochastic. After some testing I found that a modified BFGS algorithm (a quasi-Newton method, also known as a variable metric algorithm, by Broyden, Fletcher, Goldfarb and Shanno, 1970) that allows box constraints (L-BFGS-B; Byrd et al. 1995) gives reasonable results. Again, because the model is stochastic, this algorithm often finds false local minima, so I run it several times and keep the best minimum. It is a crude and time-consuming method, but it gives better results than manual tuning.
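The multi-start strategy can be sketched as below. It runs many local fits from random starting points inside the box and keeps the lowest minimum. To keep the sketch dependency-free, it swaps L-BFGS-B for a simple projected compass (pattern) search. That method also respects box constraints, but it is a different, derivative-free algorithm. In Python, L-BFGS-B itself is available as `scipy.optimize.minimize(..., method="L-BFGS-B", bounds=...)`. The function names here are hypothetical.

```python
import random


def compass_search(f, x0, bounds, step=0.25, tol=1e-6, max_sweeps=10_000):
    """Derivative-free local search that never leaves the box `bounds`.

    Each coordinate is moved by +/- its step; when no move improves f,
    all steps are halved. Stops when the largest step drops below tol.
    """
    x = [min(max(v, lo), hi) for v, (lo, hi) in zip(x0, bounds)]
    fx = f(x)
    steps = [step * (hi - lo) for lo, hi in bounds]   # relative to box width
    for _ in range(max_sweeps):
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for sign in (1, -1):
                trial = list(x)
                trial[i] = min(max(x[i] + sign * steps[i], lo), hi)   # project into box
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if max(steps) < tol:
                break
    return x, fx


def multistart(f, bounds, n_starts=100, seed=0):
    """Run the local search from random points in the box and keep the lowest minimum."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = compass_search(f, x0, bounds)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

In the actual fit, `f` would be the RMS objective evaluated on a freshly simulated model. Repeated calls at the same point therefore return slightly different values, which produces the false minima mentioned above and is the reason for keeping the best of many starts.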

Fitting is constrained to time points between -50 and 30 min. The minimized quantity is

\({\rm rms} = \sqrt{\sum_{c \in \{B,P,R\}} \sum_i (O_{c,i} - E_{c,i})^2}\)

that is, the square root of the sum of squared residuals over all time points and all colours.
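A sketch of this objective. It assumes the observed and model proportions are stored per colour as lists aligned with a common time axis in minutes. The function and argument names are hypothetical.

```python
import math


def rms(observed, model, times, t_min=-50, t_max=30):
    """Square root of the summed squared residuals over B, P, R and t in [t_min, t_max].

    observed and model map each colour ('B', 'P', 'R') to a list aligned with times.
    """
    total = 0.0
    for colour in ("B", "P", "R"):
        for t, o, e in zip(times, observed[colour], model[colour]):
            if t_min <= t <= t_max:
                total += (o - e) ** 2
    return math.sqrt(total)
```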

3.1 Default model

First, we fit the model with \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\) free. To improve the chances of finding the correct global minimum, I run the fit with 10,000 cells and 100 tries.

3.1.1 Scramble

We suspect that \(t_0\) might not be exactly zero. Hence I fit the model with \(t_0 = 0, -5, -10, -15\) minutes.

I think the issue here is not \(t_0\), but the fact that the red curve grows too fast for the model to follow.

3.1.2 All conditions

Here are fit results for all conditions with \(t_0\) fixed at zero and four parameters free: \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\). The RMS is computed over (-50, 30) min range (though the model is shown on the entire range).

3.2 Three parameters, \(\tau_2\) fixed at 8 min

3.3 RAD21

We want to see how the RAD21 fit looks with a higher \(\tau_2\). Below is a comparison between the default 3-parameter fit (\(\tau_2\) fixed at 8 min) and a fit where \(\tau_2\) was fixed at 15 min.

3.4 Switch: P\(\rightarrow\)R only after \(t_0\)

Here are the results from a modified model, where the P\(\rightarrow\)R transition can happen only after \(t_0\). By default \(t_0 = 0\), and the results are shown in the left panels below. As you can see, the red curve in the model lags behind the red curve in the data. Hence, I fitted the data again, this time with \(t_0\) fixed at -10 min (right panels). This aligns the red curves a little better.

We can also see that the pink curve is more peaked, as opposed to the smooth rise and decay in the default model.

NOT READY.

3.5 Confidence intervals on fit parameters

It is not easy to find confidence intervals for the fit parameters when the model is not only non-analytic but also stochastic. The only feasible approach is the bootstrap, which is computationally expensive. Here I give it a try.

3.5.1 Bootstraps (4 parameters)

Running bootstraps on the cluster. This is quite slow, so I do it bit by bit.

Here is the distribution of bootstrap results. Each bootstrap gives a set of fit parameters. The figures below show the distribution of fit parameters across all bootstraps.
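A sketch of the bootstrap and the summary statistics. It assumes that cells are the resampling unit and that `fit` is a hypothetical function returning a dictionary of fitted parameters. The interval is the percentile interval with linear interpolation between order statistics, the same convention as R's default `quantile` (type 7). The intervals in the figures may have been computed differently.

```python
import random


def bootstrap(cells, fit, n_boot=200, seed=0):
    """Refit to n_boot resamples of the cells (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit([rng.choice(cells) for _ in cells]) for _ in range(n_boot)]


def quantile(sorted_values, q):
    """Quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 < len(sorted_values):
        return sorted_values[i] * (1 - frac) + sorted_values[i + 1] * frac
    return sorted_values[i]


def summarise(results, level=0.95):
    """Median and percentile interval of each fitted parameter across bootstraps."""
    alpha = (1 - level) / 2
    summary = {}
    for name in results[0]:
        values = sorted(r[name] for r in results)
        summary[name] = (quantile(values, 0.5), quantile(values, alpha), quantile(values, 1 - alpha))
    return summary
```

For the 4-parameter fit, `fit` would return keys such as `tau1`, `tau2`, `k1` and `k2`. Each call is a full multi-start fit, which is why the bootstrap is slow.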

And here are fit parameters and their 95% confidence intervals:

These are default box plots with whiskers encompassing 90% of data:

In this table we show the median value of each parameter:

set        tau1 (min)  tau2 (min)  k1 (1/min)  k2 (1/min)
untreated  18.1        9.34        0.0654      0.0445
scramble   18.4        9.31        0.0726      0.087
NCAPD2     17.2        10.3        0.074       0.0195
NCAPD3     11          7.7         0.0394      0.0389
SMC2       10.7        7.68        0.0488      0.0207
RAD21      24          11.2        0.0777      0.0361
WAPL48     35.3        5.47        0.0082      0.00992
TT108      19.6        12.3        0.0746      0.0306

The next plot shows the correlation between \(\tau_2\) and \(k_2\) across bootstraps. The red line is a local regression line (LOESS).

3.5.2 Bootstraps (3 parameters)

Here I perform bootstrap analysis with fixed \(\tau_2=8\) min.

Here is the distribution of bootstrap results. Each bootstrap gives a set of fit parameters. The figures below show the distribution of fit parameters across all bootstraps.

And here are fit parameters and their 95% confidence intervals:

These are default box plots with whiskers encompassing 90% of data:

In this table we show the median value of each parameter:

set        tau1 (min)  k1 (1/min)  k2 (1/min)
untreated  18.2        0.0645      0.042
scramble   18.1        0.0663      0.0632
NCAPD2     17          0.0759      0.0173
NCAPD3     11          0.0397      0.0376
SMC2       10.4        0.0496      0.0202
RAD21      23.8        0.0761      0.0307
WAPL48     39.2        0.00765     0.00889
TT108      19.9        0.0726      0.0267

4 Four-colour plots

Here I create cell-line plots for all data sets using four colours.

PDF files:

link
untreated
scramble
NCAPD2
NCAPD3
SMC2
WAPL24
WAPL48
MK1775
MK1775_ICRF193
RAD21
TT103
TT108

5 Brown density

In this section we consider the distribution of brown or pink boxes. Hereafter, I call them “brown” for simplicity, but we do take both brown and pink boxes into account.

Note on terminology: I refer to individual boxes in cell tracks as “boxes” or “points”. I refer to chains of consecutive boxes of the same type as “events”.

5.1 Point-to-point interarrival distributions

Here I calculate the density of brown boxes and the distribution of inter-arrival times between them. If brown points are distributed randomly with density \(\lambda\), the intervals between them should follow the exponential distribution with PDF

\(f(d) = \lambda e^{-\lambda d}\)

where \(d\) is the inter-arrival time between consecutive brown points. Cells with much shorter data tracks bias the inter-arrival distribution towards shorter intervals, because long intervals cannot fit in short tracks. To account for this I made a simple model, in which I take cell tracks from the actual data, fill them randomly with brown points and calculate the inter-arrival distribution. This is done 100 times to smooth the distribution.
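The two ingredients, the observed inter-arrival times and the randomly filled tracks of the same lengths, can be sketched as follows. Tracks are assumed to be lists of 0/1 boxes (1 = brown or pink) already cut to the analysis window, and the function names are hypothetical. On a 1-min grid with independent boxes, the intervals strictly follow a geometric distribution, \(\lambda(1-\lambda)^{d-1}\), which the exponential PDF approximates for small \(\lambda\).

```python
import random


def brown_density(tracks):
    """Fraction of boxes that are brown (1) across all tracks: the estimate of lambda."""
    return sum(map(sum, tracks)) / sum(map(len, tracks))


def inter_arrival(tracks):
    """Distances (in boxes) between consecutive brown boxes within each track."""
    intervals = []
    for track in tracks:
        idx = [i for i, v in enumerate(track) if v == 1]
        intervals.extend(b - a for a, b in zip(idx, idx[1:]))
    return intervals


def random_fill_inter_arrival(tracks, density, n_rep=100, seed=0):
    """Intervals from tracks of the same lengths filled at random with brown boxes."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_rep):
        fake = [[1 if rng.random() < density else 0 for _ in track] for track in tracks]
        pooled.extend(inter_arrival(fake))
    return pooled
```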

The example below shows the result for scramble. The black bars represent data, the orange points show the predicted theoretical distribution \(f(d) = \lambda e^{-\lambda d}\) and the open blue bars show the simulated random distribution. As we can see, the simulated distribution predicts more short intervals with respect to the theoretical distribution, as expected.

I use a window of [-300, -20] min.

This plot has limited interpretation. The inter-arrival time between individual one-minute boxes assumes that each box is one event. In reality, however, a brown event can last less or more than one minute. If we see, e.g., four brown boxes next to each other, it is unlikely that they show four one-minute events; much more likely, this is one event that lasts four minutes.

5.2 Brown event duration distribution

Instead, we should consider the distribution of full brown events, short and long. Below, I show the distribution of the duration of brown events. Before the durations are calculated, I fill the occasional missing data with random brown or blue points, where the probability of brown equals the overall brown density in the data. This means that this figure is slightly random and the size of the bars will change from one instance to another. The differences are rather small, though (no more than a few percent of the bar size).
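A sketch of the two steps described here: random filling of missing boxes, then run-length counting of brown events. It assumes tracks are lists with 1 for brown/pink, 0 for blue/red and `None` for missing boxes. The names are hypothetical.

```python
import random
from collections import Counter
from itertools import groupby


def fill_missing(track, density, rng):
    """Replace missing boxes (None) by brown (1) with probability density, else blue (0)."""
    return [(1 if rng.random() < density else 0) if v is None else v for v in track]


def brown_event_durations(track):
    """Lengths of runs of consecutive brown boxes (1) in a track."""
    return [sum(1 for _ in run) for value, run in groupby(track) if value == 1]


def duration_histogram(tracks, density, seed=0):
    """Histogram {duration: number of events} pooled over all tracks."""
    rng = random.Random(seed)
    hist = Counter()
    for track in tracks:
        hist.update(brown_event_durations(fill_missing(track, density, rng)))
    return dict(sorted(hist.items()))
```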

The blue open bars show the expected distribution of the duration of events derived from a random distribution of single brown points (noise). It shows that longer events (3 minutes or longer) are over-represented in comparison with random noise. Brown occurrences cannot be explained by pure noise; they form longer, non-random “events”.
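A closed form helps show why long events are unlikely under pure noise. In a long track where each box is independently brown with probability \(\lambda\), the length \(\ell\) of a brown run is geometric, \(P(\ell) = (1-\lambda)\lambda^{\ell-1}\). Only a fraction \(\lambda^2\) of noise runs therefore last 3 minutes or longer, which is 1% for \(\lambda = 0.1\). The sketch below evaluates this formula. It ignores track ends, so it approximates a simulated noise distribution and is not necessarily how the blue bars were produced. Function names are hypothetical.

```python
def noise_duration_pmf(density, max_len):
    """P(run length = n) for runs of independent brown boxes, ignoring track ends."""
    return {n: (1 - density) * density ** (n - 1) for n in range(1, max_len + 1)}


def noise_fraction_at_least(density, length):
    """Fraction of noise runs lasting at least `length` boxes: density ** (length - 1)."""
    return density ** (length - 1)
```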

5.3 Brown event gap distribution

We can get more insight from the distribution of gaps between the brown events. Here I use the gaps between the events, not taking brown begin/end into account. For example, if “+” is brown/pink and “-” is blue (or red) then this cell track

---+--++++---+-

shows three brown events of length 1, 4, and 1 and two “gap” events of lengths 2 and 3. The blue sequences at the beginning and at the end are ignored, because we don’t know how long they are, we only see part of them.
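The bookkeeping in this example can be sketched with a run-length split, using the same “+”/“-” notation. The function name is hypothetical.

```python
from itertools import groupby


def events_and_gaps(track):
    """Split a '+'/'-' track into brown event lengths and the gaps between events.

    Leading and trailing '-' runs are dropped because their full length is unknown.
    """
    runs = [(symbol, sum(1 for _ in run)) for symbol, run in groupby(track)]
    while runs and runs[0][0] == "-":    # partially observed blue run at the start
        runs.pop(0)
    while runs and runs[-1][0] == "-":   # partially observed blue run at the end
        runs.pop()
    events = [n for symbol, n in runs if symbol == "+"]
    gaps = [n for symbol, n in runs if symbol == "-"]
    return events, gaps


print(events_and_gaps("---+--++++---+-"))
```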

The figure below shows the distribution of gaps between brown events. The simulated distribution (open bars) is calculated in the following way:

  • First, I fill missing data with random brown points with density \(\lambda\). Note that this is rather arbitrary and adds mostly one-minute points. On the other hand, these missing points are not frequent, so it does not make a huge difference.
  • I count the brown events of each length
  • I generate random data using the same cell track durations and the same brown events, distributed randomly
  • The process is repeated 100 times and the mean distribution of randomized data is shown
  • From this I find the distribution of gap length between events
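One way to implement the randomisation and gap-counting steps is sketched below. It assumes that randomly placed events may not touch, since two adjacent events would otherwise merge into a longer one. It enforces this by putting each event into a distinct slot between blue boxes. The missing-data fill of the first step is omitted. Function names are hypothetical, and the original code may place events differently.

```python
import random


def randomise_track(length, events, rng):
    """Place the given brown events at random in a track of this length (0/1 boxes).

    Each event goes into a distinct slot between blue boxes, so events never touch.
    """
    n_blue = length - sum(events)
    if len(events) > n_blue + 1:
        raise ValueError("too few blue boxes to keep the events separate")
    slots = set(rng.sample(range(n_blue + 1), len(events)))
    order = list(events)
    rng.shuffle(order)
    track, k = [], 0
    for slot in range(n_blue + 1):
        if slot in slots:               # an event placed before blue box number `slot`
            track.extend([1] * order[k])
            k += 1
        if slot < n_blue:
            track.append(0)
    return track


def gaps_between_events(track):
    """Lengths of blue runs between brown events; the runs at both ends are ignored."""
    inner = "".join(map(str, track)).strip("0")
    return [len(run) for run in inner.split("1") if run]


def simulated_gap_counts(cells, n_rep=100, seed=0):
    """Mean number of gaps of each length over n_rep randomisations.

    cells: list of (track_length, [event lengths]) pairs, one per cell.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_rep):
        for length, events in cells:
            for g in gaps_between_events(randomise_track(length, events, rng)):
                counts[g] = counts.get(g, 0) + 1
    return {g: c / n_rep for g, c in sorted(counts.items())}
```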

If brown events were random, both data and simulated distributions would be similar. The p-value in the title shows the result from a chi-square test comparing data and simulation. The number “d” in each title is the density of brown boxes (\(\lambda\)).

As we can see, there is an excess of short gaps in comparison to a random distribution. This means that brown events tend to cluster together. In particular, the 1-min gaps are more prominent, suggesting that one brown event might trigger another, immediately after it. On the other hand, this is our timing resolution, so, to some extent, it might be an experimental artifact. For example, if we have a long brown event and one box in the middle is misidentified as blue, then it creates an artificial gap of size 1. If this happens a few times, we get the observed pattern.

Below are the results for all conditions. Note: each plot shows a p-value from a chi-square test between the data and the model shown. These p-values are typically very small; for clarity, I replaced each \(p < 10^{-16}\) with zero.

5.4 Distribution comparison

Here I compare the point-to-point interval, event duration and gap distributions for each pair of conditions. I use a chi-square test, with the null hypothesis that the distribution does not depend on the sample.
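A self-contained sketch of a chi-square test of homogeneity for two histograms, the kind of test used for these pairwise comparisons. The p-value comes from a series expansion of the regularised incomplete gamma function, so no statistics library is needed. In practice a library routine (for example R's `chisq.test`) would be used. The exact test settings behind the tables are not stated, so this is an illustration and does not reproduce the p-values. Names are hypothetical. Bins with small expected counts make the chi-square approximation unreliable, so sparse tail bins are often merged first.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with df degrees of freedom.

    Sums the series P(a, z) = sum_n exp(-z) z^(a+n) / Gamma(a+n+1) for the regularised
    lower incomplete gamma function (a = df/2, z = x/2), one term at a time in log space.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_term = a * math.log(z) - z - math.lgamma(a + 1)
    total, n = 0.0, 0
    while True:
        term = math.exp(log_term)
        total += term
        n += 1
        if a + n > z and term < 1e-17 * total:   # terms now shrink geometrically
            break
        log_term += math.log(z / (a + n))
    return max(0.0, 1.0 - total)


def chi2_homogeneity(counts_a, counts_b):
    """Chi-square test that two non-empty histograms {bin: count} share one distribution.

    Returns (statistic, degrees of freedom, p-value).
    """
    bins = [b for b in sorted(set(counts_a) | set(counts_b))
            if counts_a.get(b, 0) + counts_b.get(b, 0) > 0]
    rows = [[counts_a.get(b, 0) for b in bins], [counts_b.get(b, 0) for b in bins]]
    row_tot = [sum(r) for r in rows]
    col_tot = [ra + rb for ra, rb in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for row, rt in zip(rows, row_tot):
        for observed, ct in zip(row, col_tot):
            expected = rt * ct / n
            stat += (observed - expected) ** 2 / expected
    df = len(bins) - 1
    return stat, df, chi2_sf(stat, df)
```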

5.4.1 Point-to-point interval

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108
untreated       -          0.49      0.27     0.47     0.14     0.91     0.015    0.41     0.7             1e-05    0.76     0.22
scramble        0.49       -         0.0028   0.24     0.13     0.057    0.37     0.13     0.52            0.077    0.078    0.3
NCAPD2          0.27       0.0028    -        0.3      0.24     0.51     9.5e-05  0.69     0.27            3.1e-10  0.42     0.14
NCAPD3          0.47       0.24      0.3      -        0.85     0.35     0.015    0.22     0.67            4.5e-05  0.6      0.22
SMC2            0.14       0.13      0.24     0.85     -        0.72     0.0078   0.21     0.28            3e-07    0.59     0.12
WAPL24          0.91       0.057     0.51     0.35     0.72     -        0.03     0.72     0.97            1.9e-07  0.89     0.29
WAPL48          0.015      0.37      9.5e-05  0.015    0.0078   0.03     -        0.0047   0.28            0.21     0.073    0.091
MK1775          0.41       0.13      0.69     0.22     0.21     0.72     0.0047   -        0.5             4.4e-05  0.55     0.34
MK1775_ICRF193  0.7        0.52      0.27     0.67     0.28     0.97     0.28     0.5      -               0.0059   0.95     0.79
RAD21           1e-05      0.077     3.1e-10  4.5e-05  3e-07    1.9e-07  0.21     4.4e-05  0.0059          -        1.1e-06  0.00011
TT103           0.76       0.078     0.42     0.6      0.59     0.89     0.073    0.55     0.95            1.1e-06  -        0.43
TT108           0.22       0.3       0.14     0.22     0.12     0.29     0.091    0.34     0.79            0.00011  0.43     -

5.4.2 Event duration

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108
untreated       -          0.61      0.44     0.57     0.17     0.42     0.093    0.49     0.9             0.34     0.77     0.11
scramble        0.61       -         0.17     0.35     0.0054   0.19     0.66     0.21     0.46            0.089    0.045    0.074
NCAPD2          0.44       0.17      -        0.77     0.59     0.75     0.097    0.61     0.21            0.0017   0.43     0.42
NCAPD3          0.57       0.35      0.77     -        0.61     0.64     0.38     0.46     0.38            0.084    0.64     0.62
SMC2            0.17       0.0054    0.59     0.61     -        0.22     0.017    0.32     0.29            0.0025   0.35     0.87
WAPL24          0.42       0.19      0.75     0.64     0.22     -        0.099    0.49     0.62            0.025    0.34     0.3
WAPL48          0.093      0.66      0.097    0.38     0.017    0.099    -        0.19     0.16            0.057    0.015    0.44
MK1775          0.49       0.21      0.61     0.46     0.32     0.49     0.19     -        0.37            0.069    0.27     0.41
MK1775_ICRF193  0.9        0.46      0.21     0.38     0.29     0.62     0.16     0.37     -               0.38     0.5      0.31
RAD21           0.34       0.089     0.0017   0.084    0.0025   0.025    0.057    0.069    0.38            -        0.026    0.016
TT103           0.77       0.045     0.43     0.64     0.35     0.34     0.015    0.27     0.5             0.026    -        0.1
TT108           0.11       0.074     0.42     0.62     0.87     0.3      0.44     0.41     0.31            0.016    0.1      -

5.4.3 Gaps between events

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108
untreated       -          0.58      0.64     0.36     0.33     0.85     0.11     0.51     0.68            0.011    0.84     0.1
scramble        0.58       -         0.27     0.61     0.35     0.24     0.34     0.28     0.73            0.68     0.38     0.38
NCAPD2          0.64       0.27      -        0.6      0.57     0.75     0.3      0.67     0.55            0.081    0.49     0.34
NCAPD3          0.36       0.61      0.6      -        0.72     0.53     0.11     0.2      0.48            0.27     0.72     0.21
SMC2            0.33       0.35      0.57     0.72     -        0.47     0.18     0.3      0.33            0.035    0.66     0.13
WAPL24          0.85       0.24      0.75     0.53     0.47     -        0.47     0.69     0.88            0.024    0.79     0.21
WAPL48          0.11       0.34      0.3      0.11     0.18     0.47     -        0.25     0.79            0.52     0.71     0.48
MK1775          0.51       0.28      0.67     0.2      0.3      0.69     0.25     -        0.41            0.032    0.26     0.47
MK1775_ICRF193  0.68       0.73      0.55     0.48     0.33     0.88     0.79     0.41     -               0.5      0.84     0.84
RAD21           0.011      0.68      0.081    0.27     0.035    0.024    0.52     0.032    0.5             -        0.39     0.2
TT103           0.84       0.38      0.49     0.72     0.66     0.79     0.71     0.26     0.84            0.39     -        0.32
TT108           0.1        0.38      0.34     0.21     0.13     0.21     0.48     0.47     0.84            0.2      0.32     -

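These tables show very little discernible difference between most conditions. The main exception is RAD21: its point-to-point intervals differ from those of every other condition except scramble and WAPL48 (p < 0.01), and it also differs from some conditions in event duration and gap distribution (p between 0.0017 and 0.035). WAPL48 also differs from several conditions in point-to-point intervals (p between 9.5e-05 and 0.03). Otherwise, we have no evidence to reject the null hypothesis (but we cannot accept it either!).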

5.5 S and late G2 data

5.5.1 Distribution comparison for point-to-point interval

WARNING: S and G2 data are calculated on a 2-min grid, while the other sets are based on a 1-min grid. To compare them, I rebin the distributions of the 1-min sets onto the 2-min grid. I don’t rebin the raw data, because of the problem of the reference frame: binning 1-2, 3-4, 5-6, … gives different results than binning 2-3, 4-5, 6-7, …

Distributions are easier to compare. Consider a distribution on the 1-min grid with 50 1-min events and 20 2-min events. I bin them together into 50 + 20 = 70 events and compare this number with the number of 2-min events in S or G2 (which includes all events 2 minutes long or shorter).
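The rebinning in this example can be sketched as below. Each 2-min bin is labelled by its upper edge, which is a hypothetical convention, as is the function name.

```python
def rebin_distribution(counts, width=2):
    """Merge a histogram {duration in min: count} from a 1-min grid onto a coarser grid.

    Durations 1..width go into the bin labelled `width`, durations
    width+1..2*width into the bin labelled 2*width, and so on.
    """
    rebinned = {}
    for duration, n in counts.items():
        label = -(-duration // width) * width   # ceil(duration / width) * width
        rebinned[label] = rebinned.get(label, 0) + n
    return dict(sorted(rebinned.items()))


print(rebin_distribution({1: 50, 2: 20, 3: 5, 4: 7}))
```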

Depending on how the actual brown events are distributed (their true duration and timing), comparing data based on different time grids might introduce unpredictable biases.

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108    S_phase  G2_phase
S_phase         9.3e-06    8e-12     0.043    0.00043  5.4e-05  0.011    1e-07    0.21     4.4e-05         5.9e-19  0.00035  0.00063  -        0.26
G2_phase        0.034      0.00016   0.21     0.067    0.011    0.021    6.3e-07  0.095    0.017           2.7e-12  0.69     0.096    0.36     -

5.5.2 Distribution comparison for event duration

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108    S_phase  G2_phase
S_phase         0.017      0.051     7.3e-05  0.011    0.00023  0.0031   0.19     0.037    0.11            0.45     0.00046  0.017    -        0.34
G2_phase        0.0019     0.0015    1.3e-07  6e-05    1.6e-08  0.00029  0.038    0.0057   0.0093          0.033    2.6e-06  9.9e-05  0.45     -

5.5.3 Distribution comparison for gaps between events

                untreated  scramble  NCAPD2   NCAPD3   SMC2     WAPL24   WAPL48   MK1775   MK1775_ICRF193  RAD21    TT103    TT108    S_phase  G2_phase
S_phase         0.00053    1.4e-08   0.0073   0.0016   0.0011   0.011    0.0029   0.14     0.00035         1e-09    7.1e-05  0.00089  -        0.47
G2_phase        0.32       0.072     0.36     0.47     0.17     0.13     0.11     0.15     0.052           0.00062  0.33     0.18     0.15     -